# ultimate programming language - WIP

This is a draft of an idea for the ultimate, ideal programming language.

## vs C

Let's start with C, a language that's pretty suckless already.

What's seems bad about C:

- preprocessor/macros form a language different from the base language, it's
  basically two languages in one (bad)
- non-minimal syntax, each syntactic structure has arbitrary syntactic rules
  (e.g. ; needs to follow an array initialization { } block but not a function
  { } body)
- English keywords (instead of language neutral single symbols)
- expressions as a syntactic speciality, operators are syntactically different
  from function calls, have arbitrary precedence rules etc.
- has non-intuitive and confusing things for non-programmers, e.g. = vs ==
- non-uniform data type sizes (int size depends on platform)
- lots of undefined behavior (but to some degree this is prolly needed for
  performance etc.), some if unintuitive (e.g. bit shift by more than data type
  width, one would expect it's a 0 but it's undefined)
- unnecessary features: enums, floats, doubles
- is just one non-configurable language (could be parametrized and form a
  family of languages)
- no binary literals
- ...

## basic philosophy

The basic **goals** of the language are:

- To be extremely **intuitive**.
- To be extremely **elegant**.
- To be extremely **readable**, with **short code**, in **both inline and long source code** formats.
- To be **general purpose**.
- To **not be based on any natural human language** (such as English), but rather on mathematical notation.
- Support at least **imperative** and **functional** paradigms, both any one alone as well as their combination.
- Support **compilation** and **interpretation** alike.
- Support **high level** and **low level** programming alike.
- Be naturally **extensible** without this support adding complexity to the language.
- Respect **suckless**, **KISS**, **minimal** and **Unix** philosophies.
- Have small and **simple specification**, **syntax** and **as few rules and concepts as possible**, with **as few exceptions as possible**.
- Be **simple to analyze** and allow very **simple implementation**.
- Be **easy to learn**.
- The reference implementation should be **public domain free software**, as well as accompanying materials (specification, documentation etc.).

The implied rules are:

- **Keywords must be single ASCII special symbols** (`@`, `!`, `:` etc.).
- **No package managers**, libraries and programs are not packages but simply files.
- **No Unicode** support, only 7 bit ASCII.
- **No boilerplate or bloat**.
- No declarations.

The priorities are **not**:

- To be highly productive.
- To be highly popular.
- To be highly safe.
- To achieve high performance.
- To offer many features.
- To offer a big standard library.
- To enforce any programming style.
- To conform to industry or capitalism needs.
- To be similar to other languages, based on them or be compatible with them.
- To support OOP.
- To be backward compatible.


# drafts

Perhaps use brackets for both if branching and while cycle, like in Brainfuck. E.g.:

```
a == a   // pushes 1 on stack
(        // pops stack, if 1, continues on, else jumps after amatching endbracket

c == c   // pushes on stack 
)        // pops stack, if 1, jumps back to matching start bracket, else continues 
```

notation comparison:


| name                  | example                             | `FAVSRIMGC` (pros)   |
| --------------------- | ----------------------------------- | -------------------- |
| C                     | `10 * sin(PI - f(x,y)) + 1`         | `222120002 = 11 (3)` |
| lisp                  | `(+ (* 10 (sin (- PI (f x y)))) 1)` | `022011122 = 11 (5)` |
| postfix plain         | `10 PI x y f - sin * 1 +`           | `000212222 = 11 (9)` |
| prefix plain          | `+ * 10 sin - PI f x y 1`           | `000211222 = 10 (8)` |
| postfix upper. vars   | `10 PI X Y f - sin * 1 +`           | `002212220 = 11 (9)` |
| postfix prefixed vars | `10 $PI $x $y f - sin * 1 +`        | `102112122 = 12 (7)` |
| prefix terminators    | `+ * 10 sin - PI; f x; y;;;;; 1;`   | `111001122 = 9  (4)` |

pros (suckless attributes, the number in the bracket, are SRIMG):

- F: looks familiar, intuitive and similar to commonly used math notation and other programming languages
- A: arities of functions don't have to be known for evaluation, i.e. it is clear how many parameters each function/operator has
- V: functions and variables are distinguishable just from the expression alone
- S: the expression can usually be fairly short
- R: is well readable even without highlighters
- I: simple implementation
- M: uses minimum of extra symbols (brackets, commas, ...), i.e. doesn't "waste" characters
- G: grammar is consistent, i.e. has (almost) no exceptions and irregularities (such as some operators being infix, function calls differing from operator use etc.)
- C: can be case insensitive

```
myfunc a b c:  # comment goes either until \n or another #
  +(a -(b c))

@ < (i 10)
  print(i)
  i: -(i 1)
;

(& (>(x 10) <(x 15))) ?
  (print x)
~
  (print "no")
;



(@
  (> i 10)
  (myfunc a b c)
  (myfunc d e f)
)

(?
  (= x 50)
  (print "sasasa")
  ~
  (print "eeeee")
)

THIS SEEMS GOOD:

if (cond right after "(") :
  (
    cond ?
    ...
  )

while (cond right before ")"):
  (
    ...
    cond ?
  )
```


TODO: optional program parameters at the start of the file? such as extensions

TODO: lazy evaluation? likely not, but if and a loop have to be exceptions, because we don't want both branches of if be executed.

TODO: short condition evaluation? (e.g. a != 0 & b / a)

TODO: types? arrays? structs?

Every command is a function (returns a value). This function can however have side effects and doesn't have to be a pure function (for same input always return same output).

Function returns the value returned by its last statement


# another attempt

----

IDEAS: 
  - macros should use the same syntax (and semantics?) as the language itself, perhaps macros would mean two passes or running the program: first would generate the source, the second would generate the program
  - possible feature: embed source code to the output executable, pro-free-sw

----

Let's call the project A. The project doesn't just define one language, but a whole family and hierarchy of languages.

A specifies multiple languages at different levels.

This has many advantages. we can e.g. fork and derive multiple versions of a language from a common ancestor language, drop-in another ancestor language for an existing language, we can reuse existing parsers, have data-description languages use the same syntax as the programming language itself, which can in turn allow the programming language to easily parse programs written in the programming language itself... etc.

## level 0

Languages at this level are called A0, followed by 2 parameters:

  - left bracket symbol, referenced as **A0-(**
  - right bracket symbol, referenced as **A0-)**
  
E.g. A0() or A0[].

Simple language for capturing nested bracket expressions.

**A0-alphabet** is the alphabet of the language, and is identical to the 7bit ACII. (The language is case-sensitive.) A **white character** is any character with decimal value less than 33 or equal to 127.

**A0-name** is a sequence of characters that aren't white, A0-(, A0-), `"` and `#`.

**A0-string** is a sequence of characters that aren't `"`, starting with `"` and ending with `"`.

**A0-comment** is a sequence not inside A0-string, of characters that aren't `#` and `\n`, starting with `#` and ending with either `#` or `\n`.

**A0-atom** is either A0-name or A0-string.

**A0-sequence** is a sequence composed of 0 to many A0-atoms and/or A0-brackets, called **elements**, separated by one or more white characters, possibly left and/or right padded with white characters.

**A0-bracket** is a A0-sequence with A0-( in front of it and A0-) after it, possibly left and/or right padded with white characters. An **element** of a bracket means an element of its sequence.

Given A0 language is any A0-sequence.

Example of A0() language sentence:

```
hello      # this is a comment
(
  (() 323 - XYZ)
  123
  (
    (abc def ghi)
  )
)
```

## level 1

Each of these languages is a subset of some A0 language. Its name starts with A1 and is followed by these parameters:

- first parameter to the parent A0 language
- second parameter to the parent A0 language
- `F` or `f` (supporting or not supporting functions, respectively)
- `J` or `j` (supporting or not supporting jumps, respectively)

E.g. A1()Fg. Languages at this level capture control flow of an algorithm (ignoring other details such as operators or data types).

**A1-sequence** is A0-sequence of which no element is `?`, `@`, `'`, `~`, `^` and ``` (ASCII value 96).

**A1-branch** is A0-bracket whose first element is `?`, which is followed by an element called **A1-condition**, which is followed by **A1-sequence** called **A1-then**, which may or may not be followed by `~`, which may or may not be followed by **A1-sequence** called **A1-else**. This represents the if-then and if-then-else structures.

**A1-loop** is A0-bracket whose first element is `@`, which is followed by an element called **A1-condition**, which is followed by **A1-sequence**.

**A1-function** is A0-bracket whose first element is `'`, which is followed by an element, which is followed by **A1-sequence**.

**A1-label** is A0-bracket with two elements, first one being ``` (ASCII value 96) and second one being A0-name.

**A1-jump** is A0-bracket with two elements, first one being `^` and second one being A0-name.

**A1-call** is A0-bracket whose sequence is A1-sequence and which has at least one element. The sequence starting with second element up until the end is called **A1-arguments**. This represents a function call (either user defined or otherwise available, e.g. as built-in or via library).

Given A1 language is a A1-sequence. If the language is parametrized with `f`, no A1-function structure is allowed to appear. If the language is parametrized with `j`, no A1-jump structure is allowed to appear.

Example of A1()FJ sentence:

```
(' func1   # my function
  (` mylabel)
  (? (= x 1)
    (somefunc a b c)
    (func2)
  )
)

(' func2
  (^ mylabel)
  (+ 3 2)
  ("asasa" abcd) (3) (c)
)

(func 2)

```

## level 2

Each of these languages is a subset of some of A1 language, and their name starts with A2 (followed by possible parameters). These languages define additional syntactic and semantic elements (macros, data types, operators, ...) needed for a complete programming language.




### data types

An **SDT** (simple data type) defines a set of integers. Any SDT is described by two pieces of as SDT(w,x):

- *w*: bit width, a positive non-zero integer
- *x*: signedness, either signed (s) or unsigned (u)

The set of numbers defined by SDT(w,x) contains 2^w numbers, and these are all integers between 0 and 2^w - 1 (including the bounds) if x = u, or -2^w / 2 to 2^w / 2 - 1 (including the bounds) if x = s.

Representation of a value of given SDT is not defined, i.e. it is NOT guaranteed that the value will be encoded as two's complement or have specific endianity.

Sometimes we may need to compare SDTs, so let's define a linear ordering: SDTa(wa,xa) is greater than or equal to SDTb(wb,xb) if and only if 

1. wa > wb, or 
2. wa = wb and wa = s

Values can be converted between SDTs, which is called **casting**. The conversion is as follows:

TODO

A **CDT** (complex data type) is a finite sequence of SDTs: SDT1, SDT2, ... SDTn. A **complex value** is a specific value a CDT can take, it is a sequence of values of SDT1, ... SDTn.



TODO: pointer type

TODO: size of data type representation?

### operators

**Operators** are basic mathematical and logical operations (performed on values called **operands**) that often correspond to single instructions in machine code. Operators behave like built-in functions, i.e. they are used in the same way as function calls, but their internal implementation may be (and usually should be, for performance sake) special, i.e. an addition operator should, if possible, translate to a single addition instruction as opposed to a call instruction.

The general rule of all operators is that the operation is performed in the space of (and limited by) the greatest SDT of the operands. The general algorithm of operation is as follows:

  1. Let T be the greatest SDT of all operands.
  2. Cast all operands to T.
  3. Perform the effect of the operator, cast the result to T and return it.

The operator call therefore has SDT identical to the greatest SDT of its operands.

For example when multiplying a value x of SDT(16,u) with value y of SDT(8,s), y is casted to SDT(16,u), then 16-bit unsigned multiplication is performed yielding a 16-but unsigned result (with possible overflow), which is then converted to the greatest SDT of the language and returned.

The language operators are:

- (+,a,b): Addition, returns a + b with possible wrap around (overflow) over the upper bound.
- (-,a,b): Substraction, returns a - b with possible wrap around (underflow) under the lower bound.
- (*,a,b): Multiplication, performs a * b. Given the SDT of a and b is SDT(w,x), the multiplication is performed in space SDT(2 * w,x) and cast back to SDT(w,x) to give the result.
- (/,a,b): Integer division, performs a / b. If b is 0, program should halt with a zero division error.
- (%,a,b): Integer division remainder (modulus), e.g. 6 % 5 = 1, -3 % 2 = -1 etc. If b is 0, the program should behave the same as in case of dividing by zero with /.
- (=,a,b): Comparison, the result is 1 if the integer values represented by a and b are equal, otherwise 0. 
- (|,a,b): Logical OR, the result is 1 if at least one of the operands is non0, otherwise 0.
- (&,a,b): Logical AND, the result is 1 if at least one of the operands is non0, otherwise 0.
- (!,a): Logical NOT, the result is 1 if a is 0, otherwise 1.

TODO: bitwises

## specific languages

These are final programming languages of which each is a subset of some A2 language.

### macro language

This is a simple interpreted language meant as a helper language to be embedded in another language. It should be very easy to implement. Features:

- no complex data types
- no recursion

### scripting language

This is an interpreted language that can be embedded in programs and allow their scripting. This language is also easier to implement than the compiled language. Features:

- dynamic typing, variables can change type at run time
- no variable declaration, variable is created by assigning a value to it
- only one sufficiently large SDT is used for all simple values
- associative arrays (arrays can be indexed by strings)

### compiled language

This is the more complex language meant to be compiled to binary forms or to other compiled languages such as C. Features:

- static typing


